Sense-aware Semantic Analysis: A Multi-prototype Word Representation Model using Wikipedia

نویسندگان

  • Zhaohui Wu
  • C. Lee Giles
چکیده

Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which could account for homonymy and polysemy. The “sense-specific” prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluation on semantic relatedness for both isolated words and words in sentential contexts and word sense induction demonstrate its effectiveness.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sense-Aaware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia

Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on W...

متن کامل

Named Entity Recognition in Persian Text using Deep Learning

Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

متن کامل

Semantic Representations of Word Senses and Concepts

Representing the semantics of linguistic items in a machine-interpretable form has been a major goal of Natural Language Processing since its earliest days. Among the range of different linguistic items, words have attracted the most research attention. However, word representations have an important limitation: they conflate different meanings of a word into a single vector. Representations of...

متن کامل

NASARI: a Novel Approach to a Semantically-Aware Representation of Items

The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the ...

متن کامل

Wikipedia-based Compact Hierarchical Semantics for Natural Language Processing

A correct semantic representation of words and texts underlies many text processing tasks such as text categorization, word sense disambiguation, and semantic relatedness assessment. It has long been recognized that computers require access to common-sense and domain-specific world knowledge in order to process textual data at a deeper level. In this paper, we present a novel representation of ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015